Data Mining And Data Warehousing
K-Medoids Clustering Algorithm
K-Medoids
The K-Medoids clustering algorithm is similar to K-Means, but instead of using means (averages) to define cluster centers, it uses medoids: actual data points that are most centrally located within a cluster. This makes K-Medoids more robust to noise and outliers than K-Means.
Working of K-Medoids
Initialize:
- Randomly choose k data points as the initial medoids (actual representative data points).
Assign:
- Assign each data point to the nearest medoid (using a distance metric such as Euclidean or Manhattan).
Update:
- For each medoid, try swapping it with a non-medoid point and check whether the total cost (the sum of distances from each point to its nearest medoid) decreases.
- If it does, perform the swap.
Repeat:
- Repeat the Assign and Update steps until the medoids no longer change (or a maximum number of iterations is reached).
Advantages
- Robust to outliers since it uses actual data points.
- Works with arbitrary distance metrics (Euclidean, Manhattan, etc.).
- Handles categorical or mixed-type data better than K-Means, provided a suitable dissimilarity measure is used.
Disadvantages
- Slower than K-Means, especially on large datasets (because of pairwise comparisons).
- Still requires k to be known beforehand.
- Not ideal for very high-dimensional data unless optimized.